Abstract: This paper studies the effect of a modified feature selection method and of feature re-occurrences on the performance of a multi-label associative classifier. In the proposed approach, important words (keywords) are selected from each document in the training dataset using one of two methods. The first selects as keywords all words whose mutual information (MI) value exceeds a given threshold; the second selects a limited number of words with the highest MI values in the document. The choice between the two is made per document by comparing the maximum MI value of any word in the document with a limit value: if the maximum MI value is greater than the limit value, the first method is used; otherwise, the second method is used. This ensures that keywords are selected from every document while unnecessary keywords are avoided. Association rules are generated from the extracted keywords, and re-occurrences of features (i.e., keywords) are taken into account when calculating the supports of rules. Multiple minimum support thresholds are used for rule pruning to handle the rare class problem. The classifier can assign multiple labels to a single document. If no label is found for a document from the generated rules, the class label with the highest support in the dataset is assigned to it. The classifier built using the proposed approach achieves good accuracy compared with traditional associative classifiers.

Keywords: Multi-label associative classifier, text classification, multiple minimum support thresholds, mutual information, association rules.
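
The per-document keyword selection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `select_keywords`, the parameter names `threshold`, `limit`, and `top_k`, and the sample MI scores are all hypothetical, and the sketch assumes that MI values have already been computed for each word of a document. The abstract does not state how the threshold and the limit value relate to each other, so they are kept as separate parameters here.

```python
from typing import Dict, List


def select_keywords(mi_scores: Dict[str, float], threshold: float,
                    limit: float, top_k: int) -> List[str]:
    """Pick keywords for one document from its per-word MI scores.

    All names here are illustrative assumptions, not taken from the paper.
    """
    if not mi_scores:
        return []
    # Rank words by MI, highest first (ties broken alphabetically).
    ranked = sorted(mi_scores, key=lambda w: (-mi_scores[w], w))
    max_mi = mi_scores[ranked[0]]
    if max_mi > limit:
        # Method 1: keep every word whose MI exceeds the threshold.
        return [w for w in ranked if mi_scores[w] > threshold]
    # Method 2: keep only the top_k words with the highest MI.
    return ranked[:top_k]


# Made-up MI scores for two documents.
doc_a = {"goal": 0.9, "match": 0.6, "the": 0.05}
doc_b = {"report": 0.2, "said": 0.1, "the": 0.05}

print(select_keywords(doc_a, threshold=0.5, limit=0.5, top_k=2))  # ['goal', 'match']
print(select_keywords(doc_b, threshold=0.5, limit=0.5, top_k=2))  # ['report', 'said']
```

In this sketch, the second method guarantees that a document with uniformly low MI values still contributes keywords. The first method returns a non-empty result whenever `threshold` is no larger than `limit`, which is an assumption of the sketch rather than something the abstract specifies.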